PATENT

Attorney Docket No. 42390.P13063
BEFORE THE BOARD OF PATENT APPEALS AND INTERFERENCES

In re Patent Application of 
Michael Yudkowsky 
Application No.: 10/038,409 
Filed: January 3, 2002 

For: NETWORK-ACCESSIBLE SPEAKER- 
DEPENDENT VOICE MODELS OF MULTIPLE 
PERSONS 

APPEAL BRIEF 

Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 

Sir: 

Appellant submits herewith an Appeal Brief as required by 37 C.F.R. § 41.37. This
Appeal Brief is in response to the Final Office Action dated August 26, 2004 and the Advisory 
Action dated November 4, 2004. 

I. REAL PARTY IN INTEREST 
The real party in interest is Intel Corporation, a corporation of Delaware. 

II. RELATED APPEALS AND INTERFERENCES 
There are no other appeals or interferences known to Appellant which relate to, directly
affect or are directly affected by the Board's decision in this appeal.

III. STATUS OF THE CLAIMS:

Claims 1-30 are pending in the application. These claims are reproduced in the attached
Appendix.

Claims 1-12 and 14-30 stand finally rejected under 35 U.S.C. § 102(b) over Goronzy et
al. (EP 1022725 A1). Claim 13 stands finally rejected under 35 U.S.C. § 103(a) over Goronzy et
al. in view of Ellis et al. ("Tandem Acoustic Modeling in Large Vocabulary Recognition," IEEE
Conference on Acoustics, Speech, and Signal Processing, 2001).

The rejections of claims 1-30 are appealed. 

IV. STATUS OF AMENDMENTS:

An Amendment After Final under 37 C.F.R. § 1.116 was filed on September 30, 2004, 
with a single proposed amendment to correct a typographical error in claim 26. Because this 
proposed amendment does not affect this appeal and improves the form of the claims, the
Amendment should be or should have been entered. 

V. SUMMARY OF THE INVENTION:

Regarding independent claims 1, 18 and 28, telephony system 300 may determine an
identity of a speaker through a network (page 11, lines 13-15; Fig. 2, element 205; and page 7,
lines 9-12). Output data including identification information may be provided over the network
to one or more speech-recognition systems (page 7, lines 12-21). See also page 7, lines 9 and 10,
and page 6, lines 7-16, for further discussion of the network. SIP server 340 and/or voice model
database server 350 may attempt to locate, based on the identity of the speaker, a voice model for
the speaker (Fig. 2, element 210; and page 7, line 22 through page 8, line 7). Voice model
database server 350 may retrieve the voice model 351 for the speaker from a storage area if the
voice model for the speaker is located (page 11, lines 14-18).
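
By way of illustration only, and without limiting the claims, the sequence summarized above (determine an identity, attempt to locate a voice model, retrieve it if located) can be sketched as follows. Every name in this sketch (VoiceModel, VoiceModelStore, determine_identity, handle_call, and the "caller_id" key) is hypothetical and does not appear in the application; the in-memory dictionary merely stands in for the storage area.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VoiceModel:
    # Hypothetical record standing in for a speaker-dependent voice model.
    speaker_id: str
    speaker_dependent: bool = True


@dataclass
class VoiceModelStore:
    # Hypothetical stand-in for the storage area; a dict keyed by speaker identity.
    models: Dict[str, VoiceModel] = field(default_factory=dict)

    def locate(self, speaker_id: str) -> bool:
        # "Attempting to locate" a voice model based on the identity of the speaker.
        return speaker_id in self.models

    def retrieve(self, speaker_id: str) -> Optional[VoiceModel]:
        # "Retrieving" the voice model from the storage area.
        return self.models.get(speaker_id)


def determine_identity(identification_info: Dict[str, str]) -> Optional[str]:
    # Assumption: identification information arrives over the network as a
    # simple mapping, and the identity is carried under a "caller_id" key.
    return identification_info.get("caller_id")


def handle_call(identification_info: Dict[str, str],
                store: VoiceModelStore) -> Optional[VoiceModel]:
    speaker_id = determine_identity(identification_info)
    # The model is retrieved only if the speaker is identified and a model is located.
    if speaker_id is None or not store.locate(speaker_id):
        return None
    return store.retrieve(speaker_id)
```

In this sketch a call such as handle_call({"caller_id": "alice"}, store) returns a model only when one has been stored for that identity; how a missing model is handled is left open here.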

Regarding independent claim 15, in addition to the above, SIP server 340 may connect
telephone 320 over the network to voice model database server 350 (Fig. 2, element 220; page
11, lines 19 and 20). Voice model database server 350 may prompt the caller 310 to provide an
utterance 330 (Fig. 2, element 225; page 11, lines 20 and 21). Caller 310 may speak the
utterance 330 (Fig. 2, element 230), and voice model database server 350 may receive the
utterance (Fig. 2, element 235; page 11, lines 21 and 22). Voice model database server 350 may
use speaker-dependent voice model 351 to extract phonemes from the utterance 330 (Fig. 2,
element 240; page 11, lines 22 and 23). Voice model database server 350 may transmit the
phonemes 352 over the network to a speech-recognition system 365 (Fig. 2, element 245), and
speech-recognition system 365 may use phonemes 352 to determine a content of the utterance
330 (Fig. 2, element 250; page 11, lines 22 and 23).

VI. GROUNDS OF REJECTION: 

A. Claims 1-12 and 14-30 stand rejected under 35 U.S.C. § 102(b) over Goronzy et
al. (EP 1022725 A1).

B. Claim 13 stands rejected under 35 U.S.C. § 103(a) over Goronzy et al. in view of
Ellis et al. ("Tandem Acoustic Modeling in Large Vocabulary Recognition,"
IEEE Conference on Acoustics, Speech, and Signal Processing, 2001).

VII. ARGUMENT:

A. Claims 1-12 and 14-30 are patentable under 35 U.S.C. § 102(b) over Goronzy et al.

1. Claims 1-12, 14, and 18-30:

Appellant respectfully traverses the § 102(b) rejection of independent claims 1, 18, and
28 over Goronzy et al. Claims 1, 18, and 28 require a method, article of manufacture, and
apparatus including, inter alia, "determin[ing] an identity of a speaker through a network over
which output data including identification information is provided to one or more speech-
recognition systems." Goronzy et al. fails to disclose at least the above-quoted element of
independent claims 1, 18, and 28.

With regard to these claims, page 6 of the Final Office Action alleges that the above-
quoted claim language is met because "the networked system (col. 2, lines 2-3) checks the
identity of the speaker every time the speaker changes, which requires use of some form of
identification information to output to the verification module (4)." Page 3 of the Advisory
Action adds that "the teachings of Goronzy of a networked system at column 3, lines 2-3,
provide adequate support for applicant's claimed 'network' limitation."

a. Goronzy et al. does not teach "a network over which output data
including identification information is provided to one or more
speech-recognition systems."

Goronzy et al. contains, in its entirety, exactly two variations on the word "network."
The first, "state transition network," appears in the middle of paragraph 0002 and is not relevant
to this appeal. The second, "a networked system," occurs at col. 3, lines 2 and 3 (in
paragraph 0017), in a summary portion of Goronzy et al. Notably, this term "networked system"
occurs well before the description of Figs. 1 and 2 of Goronzy et al., and at most discloses that
the system in Figs. 1 and 2 may be coupled to a network (i.e., "networked"). It is upon this
single term in the summary of Goronzy et al. that the Examiner rests the whole § 102(b) rejection
of independent claims 1, 18, and 28. This is wholly improper, as will be explained below.

M.P.E.P. § 2131 and Federal Circuit case law require that for anticipation, "The elements
must be arranged as required by the claim, but this is not an ipsissimis verbis test, i.e., identity of
terminology is not required. In re Bond, 910 F.2d 831, 15 USPQ2d 1566 (Fed. Cir. 1990)."
Thus, to anticipate under § 102, a reference must contain all claim elements arranged as required
by the claim. Because the term "a networked system" in paragraph 0017 does not, by itself,
teach any specific arrangement of elements, any anticipatory teachings in Goronzy et al. must be
found in Fig. 1, if at all.

Fig. 1 of Goronzy et al. shows a device, or portion of a system, whose circuit components
are connected by typical, point-to-point electrical connections. For example, col. 3, lines 39-42,
of Goronzy et al. discloses that microphone 1 inputs an analog signal to A/D converter 2, which
inputs a corresponding digital signal to feature extraction module 3. Because an analog signal
from microphone 1 to A/D converter 2 is typically transferred via an electrically conductive wire
or trace, all arrows in Fig. 1 similar to that connecting components 1 and 2 are also, by
implication, mere electrical connections. This conclusion is further supported by col. 4, lines 1-
6, which discloses that modules 5 and 6 are selectively connected to storages 7-10 via "a switch
11" that is controlled by "a control signal" from module 4. Such a "switch," without more,
discloses only a typical circuit component connected via electrical connections.

A cursory look at Fig. 1 will also reveal that this system appears to be self-contained.
No "network" type connections (e.g., communication interfaces) appear to be present in Fig. 1.
Paragraph 0019 of Goronzy et al. confirms these observations by teaching that "Figure 1 shows
only the part of the automatic speech recognition system . . . that is used for speaker adaptation
and automatic identification of the speaker." (col. 3, lines 35-38 (emphasis added)). The
implication of this statement is that Fig. 1 shows and teaches only a self-contained
speaker identification system, and does not disclose a network or other circuitry to which this
system may be coupled.

Thus, notwithstanding the mention of "a networked system" in paragraph 0017, neither
Fig. 1 of Goronzy et al. nor its associated description explicitly teaches a "network" as claimed. To
the contrary, at least paragraph 0019 implies that Fig. 1 excludes such a network. Further, a
device with connected circuit components as illustrated in Fig. 1 of Goronzy et al. cannot
reasonably be said to include "a network" as claimed. Because Fig. 1 of Goronzy et al. fails to
teach, either explicitly or implicitly, a network as claimed, the rejection of claims 1, 18, and 28 is
improper.

Even if Fig. 1 of Goronzy et al. did include a network, it does not teach a network "over
which output data including identification information is provided to one or more speech-
recognition systems" as claimed. Goronzy et al., at col. 3, lines 45 and 46 in paragraph 0020,
discloses that "In the verification module 4, an automatic identification of the speaker is
performed." Goronzy et al. also discloses in the next paragraph 0021 that verification module 4
selects among different model sets 7-10 "via a switch 11." Thus, verification module 4 provides
identification information only to switch 11, and not to a speech-recognition system (e.g.,
recognition module 5) as set forth in claims 1, 18, and 28. Switch 11 does not reasonably
correspond to the claimed network; nor does switch 11 provide any input identification
information to recognition module 5. Thus, Goronzy et al. fails to teach a network "over which
output data including identification information is provided to one or more speech-recognition
systems" as claimed, and the rejection of claims 1, 18, and 28 is improper for this additional
reason.

b. Goronzy does not teach "determining an identity of a speaker
through a network."

Goronzy et al. at col. 3, lines 45 and 46 discloses that "In the verification module 4, an
automatic identification of the speaker is performed." Thus, Goronzy et al. explicitly teaches
that identification of a speaker is performed solely by and within verification module 4. This
teaching of identification by verification module 4 plainly does not meet the additional claimed
limitation of determining an identity of a speaker "through a network." Verification module 4
does not correspond to a network. Nor does the signal traveling from microphone 1 to A/D
converter 2 to feature extraction module 3 to verification module 4 reasonably correspond to
"determining an identity . . . through a network" as claimed. The rejection of claims 1, 18, and
28 is improper for this additional reason.

For at least these reasons, Goronzy et al. fails to disclose all elements of independent
claims 1, 18, and 28, either explicitly or implicitly. The § 102(b) rejection of these claims
remains improper and should be reversed.

Dependent claims 2-12, 14, and 18-30 are allowable at least by virtue of their respective 
dependence from claims 1, 18, and 28. 

2. Claims 15-17: 
Appellant respectfully traverses the § 102(b) rejection of independent claim 15 over
Goronzy et al. Claim 15 requires a method including, inter alia, "accessing by a speaker a
network containing a speech recognition system." Goronzy et al. fails to disclose at least the
above-quoted element of independent claim 15.

As explained above with regard to claims 1, 18, and 28, Figs. 1 and 2 of Goronzy et al. do
not disclose a network, just a device or portion of a system.

Even if Goronzy et al. taught a network, the only thing that Goronzy et al. teaches a
speaker "accessing" is microphone 1 (see col. 3, lines 39 and 40). Such microphone 1 cannot be
reasonably considered to be a network or part of a network; it is merely an input device coupled
to verification module 4. At most, Goronzy et al. teaches a speaker directly accessing a portion
(i.e., microphone 1) of a speech recognition system.

Claim 15 requires, however, "accessing ... a network containing a speech recognition
system," and not directly accessing the speech recognition system itself. Thus, Goronzy et al.
fails to disclose accessing a network by a speaker, as required by claim 15. The rejection of
claim 15 is thus improper.

Dependent claims 16 and 17 are allowable at least by virtue of their dependence from
claim 15.

B. Claim 13 is patentable under 35 U.S.C. § 103(a) over Goronzy et al. in view of
Ellis et al.

In addition to the reasons given above in section VII(A)(1), claim 13 is allowable for the
following reasons.

To establish a prima facie case of obviousness, three basic criteria must be met. First, 
there must be some suggestion or motivation, either in the references themselves or in the 
knowledge generally available to one of ordinary skill in the art, to modify the reference or to 
combine reference teachings. Second, there must be a reasonable expectation of success. 
Finally, the prior art reference (or references when combined) must teach or suggest all the claim 
limitations. See M.P.E.P. § 2143. 

Regarding dependent claim 13, the addition of Ellis et al., even if proper, fails to cure the
deficiencies of Goronzy et al. explained above. Ellis et al. also fails to teach or suggest the
above-quoted element of the method recited in independent claim 1. The Final Office Action
does not allege that Ellis et al. teaches or suggests the claim element at issue. Hence, a prima
facie case of obviousness has not been established for dependent claim 13, because the
combination of references fails to teach or suggest all elements of this dependent claim.

A prima facie case of obviousness also has not been established for claim 13, at least
because no motivation has been provided to combine Goronzy et al. and Ellis et al. The
proposed justification on page 5 of the Final Office Action, "providing a recognition system that
is able to recognize data in noisy backgrounds," is conclusory and devoid of citation to either
reference. Such a bare conclusion does not establish a prima facie case of obviousness without
some evidence supporting that conclusion. No reasoning, in the references or otherwise, has
been provided detailing what deficiency or need in the system of Goronzy et al. would have
motivated one of ordinary skill in the art to add the teachings of Ellis et al. In the absence of
such evidence, a prima facie case of obviousness has not been established for claim 13.

A prima facie case of obviousness also has not been established, because at least
Goronzy et al. teaches away from the proposed combination. See M.P.E.P. § 2145(X)(D)
("A proposed modification cannot render the prior art unsatisfactory for its intended purpose or
change the principle of operation of a reference"). As is apparent from the discussion of
Goronzy et al. above, the reference teaches a scheme for locally detecting and identifying
different speakers. This is all performed by a wired microphone 1, for which there is no
indication that noise is a problem. To add Aurora feature extraction would change the principle
of operation (e.g., multiple user identification and use) of Goronzy et al. This being the case, at
least Goronzy et al. teaches away, by its principles and goals, from the proposed combination. A
prima facie case of obviousness has not been established for claim 13 on this additional ground.

CONCLUSION 

For the reasons set forth above, Appellant respectfully solicits the Honorable Board to
reverse the Examiner's rejection of claims 1-30. 

To the extent necessary, a petition for an extension of time under 37 C.F.R. § 1.136 is 
hereby made. Please charge any shortage in fees due in connection with the filing of this paper, 
including extension of time fees, to Deposit Account No. 50-0221 and please credit any excess 
fees to such deposit account. 



Respectfully submitted,

Dated: January 20, 2005

Alan Pedersen-Giles
Registration No. 39,996

c/o Intel Americas
LF3
4030 Lafayette Center Drive
Chantilly, VA 20151
(703) 633-1061

VIII. CLAIMS APPENDIX

1. (previously presented) A method, comprising: 

determining an identity of a speaker through a network over which output data including 
identification information is provided to one or more speech-recognition systems; 

attempting to locate, based on the identity of the speaker, a voice model for the speaker;
and

retrieving from a storage area the voice model for the speaker if the voice model for the 
speaker is located. 

2. (original) The method of claim 1, wherein the voice model comprises a speaker- 
dependent voice model. 

3. (previously presented) The method of claim 2, wherein determining the identity
of the speaker over the network comprises using identification information received from the 
speaker over the network to determine the identity the speaker. 

4. (original) The method of claim 2, wherein determining the identity of the speaker
over the network comprises:

receiving from a device in the network identifying data regarding the speaker; and
determining the identity of the speaker based on the identifying data regarding the
speaker.

5. (original) The method of claim 2, wherein the storage area comprises an internal 
storage area containing speaker-dependent voice models for multiple persons. 

6. (original) The method of claim 2, wherein the storage area comprises an external 
storage area accessible over the network.

7. (original) The method of claim 2, wherein the output data comprise phonemes. 

8. (original) The method of claim 7, further comprising:
receiving an utterance from the speaker; 

using the voice model to extract phonemes from the utterance; and 
transmitting the phonemes over the network to the speech-recognition system. 

9. (original) The method of claim 8, wherein the utterance comprises one or both of 
vocalized words and vocalized sounds. 

10. (original) The method of claim 9, further comprising: 

receiving from the speech-recognition system contents of a recognized utterance of the 
speaker; and

revising the voice model for the speaker based on the contents of the recognized 
utterance. 

11. (original) The method of claim 2, wherein the output data comprise a voice 
model for the speaker. 

12. (original) The method of claim 11, further comprising transmitting the voice
model over the network to the speech-recognition system. 

13. (original) The method of claim 2, further comprising:
receiving Aurora features extracted from an utterance of the speaker;
extracting phonemes from the Aurora features; and

transmitting the phonemes over the network to a speech recognition system.

14. (original) The method of claim 2, further comprising: 

retrieving a speaker-independent voice model if failing to locate the voice model for the 
speaker; 

receiving an utterance from the speaker; 
using the speaker-independent voice model to extract phonemes from the utterance;
transmitting the phonemes over the network to a speech-recognition system; 

receiving from the speech-recognition system contents of a recognized utterance of the 
speaker; and 

generating a voice model for the speaker based on the contents of the recognized 
utterance.

15. (original) A method, comprising: 

accessing by a speaker a network containing a speech recognition system; 

identifying by a first device the speaker based on information provided by the speaker; 

requesting by the first device a speaker-dependent voice model for the speaker from a 
voice model database server providing phonemes to any speech recognition system in the 
network; 

retrieving by the voice model database server the speaker-dependent voice model from a 
storage area if the voice model database server locates a speaker-dependent voice model for the 
speaker; 

connecting by the first device the speaking device with the voice model database server; 
prompting by the voice model database server the speaker to provide an utterance; 
speaking by the speaker the utterance into the speaking device; 
receiving by the voice model database server the utterance; 

using by the voice model database server the speaker-dependent voice model to extract 
phonemes from the utterance; 

transmitting by the voice model database server the phonemes over the network to a 
speech-recognition system; and 

using by the speech-recognition system the phonemes to determine a content of the 
utterance. 

16. (original) The method of claim 15, wherein the storage area comprises a storage
area within the voice model database server containing speaker-dependent voice models for 
multiple persons. 

17. (original) The method of claim 15, wherein the storage area comprises a storage 
area accessible by the voice model database server over the network. 

18. (previously presented) An article of manufacture comprising: 

a machine-accessible medium including thereon sequences of instructions that, when 
executed, cause one or more machines to: 

determine an identity of a speaker through a network over which output data is 
provided to one or more speech-recognition systems; 

attempt to locate, based on the identity of the speaker, a voice model for the
speaker; and

retrieve from a storage area the voice model for the speaker if the voice model for 
the speaker is located.

19. (original) The article of manufacture of claim 18, wherein the sequences of 
instructions that, when executed, cause the one or more machines to attempt to locate, based on 
the identity of the speaker, the voice model for the speaker, comprise sequences of instructions 
that, when executed, cause the one or more machines to attempt to locate, based on the identity 
of the speaker, a speaker-dependent voice model for the speaker. 

20. (original) The article of manufacture of claim 19, wherein the sequences of 
instructions that, when executed, cause the one or more machines to retrieve from the storage 
area the voice model for the speaker if the voice model for the speaker is located comprise 
sequences of instructions that, when executed, cause the one or more machines to retrieve from 
an internal storage area containing speaker-dependent voice models for multiple persons the 
voice model for the speaker if the voice model for the speaker is located. 




21. (original) The article of manufacture of claim 19, wherein the sequences of 
instructions that, when executed, cause the one or more machines to retrieve from the storage 
area the voice model for the speaker if the voice model for the speaker is located comprise 
sequences of instructions that, when executed, cause the one or more machines to retrieve from 
an external storage area accessible over the network the voice model for the speaker. 

22. (original) The article of manufacture of claim 19, wherein the sequences of 
instructions that, when executed, cause the one or more machines to determine the identity of the 
speaker through the network over which the output data, regarding the person with access to the
speech-recognition system receiving the output data, is provided to the one or more speech- 
recognition systems comprise sequences of instructions that, when executed, cause the one or 
more machines to determine the identity of the speaker through the network over which 
phonemes to the one or more speech-recognition systems is provided regarding the person with 
access to the speech-recognition system receiving phonemes. 

23. (original) The article of manufacture of claim 22, wherein the machine-accessible 
medium further comprises sequences of instructions that, when executed, cause the one or more 
machines to: 

receive an utterance from the speaker; 

use the voice model to extract phonemes from the utterance; and 
transmit the phonemes over the network to a speech-recognition system. 

24. (original) The article of manufacture of claim 23, wherein the machine-accessible 
medium further comprises sequences of instructions that, when executed, cause the one or more 
machines to: 

receive from a speech-recognition system contents of a recognized utterance of the 
speaker; and 

revise the voice model for the speaker based on the contents of the recognized utterance. 

25. (original) The article of manufacture of claim 19, wherein the sequences of 
instructions that, when executed, cause the one or more machines to determine the identity of the 
speaker through the network over which the output data, regarding the person with access to the
speech-recognition system receiving the output data, is provided to the one or more speech- 
recognition systems comprise sequences of instructions that, when executed, cause the one or 
more machines to determine the identity of the speaker through the network over which the voice 
model regarding the person to the one or more speech-recognition systems is provided regarding 
the person with access to the speech-recognition system receiving the voice model regarding the 
person. 

26. (previously presented) The article of manufacture of claim 19, wherein the 
machine-accessible medium further comprises sequences of instructions that, when executed, 
cause the one or more machines to transmit the voice model over the network to a speech- 
recognition system. 

27. (original) The article of manufacture of claim 26, wherein the machine-accessible 
medium further comprises sequences of instructions that, when executed, cause the one or more 
machines to: 

retrieve a speaker-independent voice model if failing to locate the voice model for the 
speaker; 

receive an utterance from the speaker; 

use the speaker-independent voice model to extract phonemes from the utterance; 
transmit the phonemes over the network to a speech-recognition system; 
receive from the speech-recognition system contents of a recognized utterance of the 
speaker; and 

generate a voice model for the speaker based on the contents of the recognized utterance. 

28. (previously presented) An apparatus, comprising: 

an identification determiner to determine an identification of a speaker through a network 
over which output data including identification information is provided to one or more speech- 
recognition systems; 

a voice-model locator to locate a speaker-dependent voice model for the speaker based on
the identity of the speaker; and 

a voice-model retriever to retrieve the speaker-dependent voice model for the speaker 
from a storage area based on the identity of the speaker. 

29. (original) The apparatus of claim 28, further comprising: 
an utterance receiver to receive an utterance from the speaker; 

a phoneme extractor to extract phonemes from the utterance using the speaker-dependent 
voice model; and 

a phoneme transmitter to transmit the phonemes over the network to a speech-recognition
system.

30. (previously presented) The apparatus of claim 28, further comprising: 

a recognized-utterance receiver to receive from a speech-recognition system contents of a 
recognized utterance of the speaker; and

a voice model reviser to revise the speaker-dependent voice model of the speaker based 
on the contents of the recognized utterance. 

IX. EVIDENCE APPENDIX
None. 

X. RELATED PROCEEDINGS APPENDIX
None. 